
review: Rerun of PR #1120 (Rascal) on 8xH100 SXM#1177

Open
dexhunter wants to merge 1 commit into openai:main from dexhunter:rerun/pr1120-rascal-reproduction

Conversation

@dexhunter

Summary

Independent rerun of PR #1120 (Rascal, val_bpb 1.1099) on 8xH100 SXM (GCP).

Ran the submitted train_gpt.py from commit 39ed402 with SKIP_GPTQ=1, as specified in PR #1120's README reproduction instructions.

Rerun Result

| Metric | Published (seed 300) | Rerun (seed 1337) | Delta |
| --- | --- | --- | --- |
| final_sliding_window_exact val_bpb | 1.10979 | 1.11350 | +0.00371 |
| final_sliding_window_exact val_loss | 1.87383 | 1.88010 | +0.00627 |
| Steps | 6,593 | 6,881 | +288 |
| step_avg | ~91 ms | 87.2 ms | −3.8 ms |

The rerun val_bpb is +0.00371 worse than the published seed 300 result. This gap is approximately 7× typical seed variance (~0.0005 std) and 17× the published 3-seed std (0.00021).
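The gap-to-variance ratios can be checked with a quick sketch. The values below come from the table above; the ~0.0005 typical seed std is the approximate figure quoted in this PR, not a measured quantity:

```python
# Sanity-check the rerun gap against seed variance (numbers from this PR).
published = 1.10979        # seed 300 val_bpb from the submission
rerun = 1.11350            # seed 1337 val_bpb from this rerun
gap = rerun - published

typical_seed_std = 0.0005      # approximate figure for run-to-run seed noise
published_3seed_std = 0.00021  # std reported across the submission's 3 seeds

print(f"gap = {gap:+.5f} bpb")
print(f"{gap / typical_seed_std:.1f}x typical seed std")
print(f"{gap / published_3seed_std:.1f}x published 3-seed std")
```

This puts the gap at roughly 7.4x the typical seed std and 17.7x the published 3-seed std, consistent with the "7×" and "17×" figures above.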

Environment

  • Hardware: 8× H100 80GB SXM (GCP a3-highgpu-8g)
  • Driver: 565.57.01
  • Python: 3.12.13
  • PyTorch: 2.9.1+cu128
  • NCCL_NET: Socket (required on GCP)
  • Command: NCCL_NET=Socket SKIP_GPTQ=1 torchrun --standalone --nproc_per_node=8 train_gpt.py

Observations

  1. The rerun achieves more training steps (6,881 vs 6,593) due to a faster step time (87.2 ms vs ~91 ms), yet the final result is significantly worse.

  2. The submitted train_gpt.py does not contain quantization code. It outputs final_model.pt (raw state dict) and computes final_sliding_window_exact on the unquantized model. The int6+zstd quantization and final_int6_roundtrip metrics visible in the published seed logs appear to be produced by an external runner rather than by train_gpt.py itself.

  3. The reported 3-seed metric (val_bpb 1.1099) corresponds to final_sliding_window_exact, which is measured on the pre-quantization model.
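For context on what a round-trip metric like final_int6_roundtrip could involve, here is a minimal symmetric per-tensor int6 quantize/dequantize sketch. This is an illustrative guess, not the submission's actual quantizer; the function name, per-tensor scaling scheme, and clipping range are all assumptions:

```python
import numpy as np

def int6_roundtrip(x: np.ndarray) -> np.ndarray:
    """Symmetric per-tensor int6 quantize/dequantize (illustrative sketch)."""
    scale = np.abs(x).max() / 31.0  # signed 6-bit range is [-32, 31]
    if scale == 0.0:
        return x.copy()
    q = np.clip(np.round(x / scale), -32, 31).astype(np.int8)
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
err = np.abs(int6_roundtrip(w) - w).max()
print(f"max abs round-trip error: {err:.4f}")
```

A real runner would apply something like this per weight tensor before re-evaluating val_bpb, which is what makes the pre- vs post-quantization distinction in observation 3 matter.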

Files

  • RERUN_NOTES.md — detailed notes
  • RERUN_seed1337.log — full rerun output log

This rerun is provided for community transparency, following the precedent of PR #1126 (rerun of PR #1089).

Ran the submitted train_gpt.py (commit 39ed402) with SKIP_GPTQ=1 on GCP 8xH100.
Result: final_sliding_window_exact val_bpb 1.11350 vs published 1.10979 (seed 300).
Gap: +0.00371 bpb, about 7x larger than typical seed variance (~0.0005).

Note: train_gpt.py contains no quantization code; the published int6+zstd
metrics appear to come from an external runner.
newjordan pushed a commit to newjordan/parameter-golf-1 that referenced this pull request Mar 31, 2026
… script

The 2159-line rascal_master (no quantization) was mistakenly committed to
records/ instead of the 2468-line script that produced the submission logs.
The correct file includes int6+zstd quantization, GPTQ skeleton, and zstandard
compression — matching bytes_code=118521 reported in submission.json and logs.

Addresses reproducibility concern raised in PR openai#1177.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@newjordan

newjordan commented Mar 31, 2026

Sorry man, my agent replaced the file in git while I was doing optimizations last night. I re-uploaded the proper file. I've got my hands in three tests at any given time and it gets messy in my lab.

I've been working on model quality, not wind-down, so I had chopped the wind-down code for my testing. It should not have been pushed.

If it makes you feel any better, have an agent scrape my notes and ablations from yesterday and you'll have a bunch more data =) I'm working in the open.

